Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-127541: Update os.walk example #127765

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

totosuki
Copy link
Contributor

@totosuki totosuki commented Dec 9, 2024

I updated the example to exclude __pycache__ directories instead of CVS.
While the issue suggested replacing CVS with .git, a comment pointed out that .git is usually found only in the root directory, making it not equivalent to CVS, which can appear in multiple subdirectories.
On the other hand, __pycache__ directories often appear across various levels of a directory tree, making them a more practical modern example!


📚 Documentation preview 📚: https://cpython-previews--127765.org.readthedocs.build/

Copy link

cpython-cla-bot bot commented Dec 9, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

Doc/library/os.rst Outdated Show resolved Hide resolved

import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
print(root, "consumes", end=" ")
print(sum(getsize(join(root, name)) for name in files), end=" ")
print("bytes in", len(files), "non-directory files")
if 'CVS' in dirs:
dirs.remove('CVS') # don't visit CVS directories
if '__pycache__' in dirs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to update which directory is being walked over (currently it's python/Lib/email)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion! I’ll update the example to use os.getcwd() for better compatibility.

Copy link
Member

@picnixz picnixz Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending where you are actually running the command, it might be very very long for the user (say they run it from $HOME). Instead, we can walk over 'python/Lib' instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback! I’ve updated the example to use python/Lib as the base directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since __pycache__ is more common throughout python/Lib than python/Lib/email, I updated the directory to make the example more representative and practical.

Copy link
Member

@picnixz picnixz Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, I only have a single __pycache__ in my email package (even though email.mime exists) so you wouldn't see the "recursion" in this example I think if you compare with a find Lib/email -not -ipath '*__pycache__*'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting from Lib you will walk into .git, which may be non-desirable. Lib/xml contains several subdirectories with their own __pycache__ directories.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah maybe, Lib/xml is better then.

@totosuki totosuki requested a review from picnixz December 9, 2024 16:25
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@totosuki totosuki requested a review from picnixz December 9, 2024 22:06
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a better example. But look also at the fwalk() example and docstrings.

@bedevere-app
Copy link

bedevere-app bot commented Dec 9, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@totosuki
Copy link
Contributor Author

I have made the requested changes; please review again.

@bedevere-app
Copy link

bedevere-app bot commented Dec 10, 2024

Thanks for making the requested changes!

@serhiy-storchaka: please review the changes made to this pull request.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.


import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
print(root, "consumes", end=" ")
print(sum(getsize(join(root, name)) for name in files), end=" ")
print("bytes in", len(files), "non-directory files")
if 'CVS' in dirs:
dirs.remove('CVS') # don't visit CVS directories
if '__pycache__' in dirs:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting from Lib you will walk into .git, which may be non-desirable. Lib/xml contains several subdirectories with their own __pycache__ directories.

@totosuki
Copy link
Contributor Author

Thank you!
You're absolutely right that going into .git isn't ideal. I'll update it to python/Lib/xml now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting merge docs Documentation in the Doc dir skip news
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

3 participants